179 research outputs found

    Improved rates for Wasserstein deconvolution with ordinary smooth error in dimension one

    This paper deals with the estimation of a probability measure on the real line from data observed with additive noise. We are interested in rates of convergence for the Wasserstein metric of order $p \geq 1$. The distribution of the errors is assumed to be known and to belong to a class of supersmooth or ordinary smooth distributions. In the univariate setting, we obtain an improved upper bound in the ordinary smooth case and less restrictive conditions for the existing bound in the supersmooth case. In the ordinary smooth case, a lower bound is also provided, and numerical experiments illustrating the rates of convergence are presented.
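
As a small illustration of the metric mentioned in this abstract (not of the deconvolution estimator itself), the order-$p$ Wasserstein distance between two empirical measures on the real line with the same number of atoms reduces to comparing sorted samples. The function name `wasserstein_1d` and the equal-sample-size restriction are assumptions made for this sketch.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """Order-p Wasserstein distance between two empirical measures on the
    real line, each putting mass 1/n on its n sample points.

    Assumption of this sketch: both samples have the same size n, so the
    optimal coupling simply matches the i-th smallest points of each sample.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("the order p must satisfy p >= 1")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Average transport cost under the monotone (sorted) coupling.
    cost = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


if __name__ == "__main__":
    # Shifting every point by 1 gives distance 1 for any order p.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], p=2))
```

In one dimension the monotone coupling is optimal for every $p \geq 1$, which is why sorting suffices here.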

    Statistical learning for wind power : a modeling and stability study towards forecasting

    We focus on wind power modeling using machine learning techniques. We show, on real data provided by the wind energy company Maïa Eolis, that parametric models, even those closely following the physical equation relating wind production to wind speed, are outperformed by intelligent learning algorithms. In particular, the CART-Bagging algorithm gives very stable and promising results. In addition, as a step towards forecasting, we quantify the impact of using deteriorated wind measurements on performance. We also show on this application that the default methodology for selecting a subset of predictors provided in the standard random forest package can be refined, especially when one of the predictors has a major impact.

    Projection-based curve clustering

    This paper focuses on unsupervised curve classification in the context of the nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU-time consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centers found by the clustering method based on projections, compared to the “true” ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
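
The general idea of clustering curves through their projection coefficients can be sketched as follows. This is a minimal illustration, not the paper's procedure: the fixed discrete cosine basis, the deterministic initialization, and all function names (`cosine_basis`, `project`, `kmeans`, `cluster_curves`) are assumptions, and the basis-selection step described in the abstract is not reproduced.

```python
import math


def cosine_basis(n_points, n_funcs):
    """First n_funcs vectors of the orthonormal DCT-II basis on n_points grid points."""
    basis = []
    for j in range(n_funcs):
        scale = math.sqrt((1.0 if j == 0 else 2.0) / n_points)
        basis.append([scale * math.cos(math.pi * j * (i + 0.5) / n_points)
                      for i in range(n_points)])
    return basis


def project(curve, basis):
    """Coefficients of a discretized curve on the given orthonormal vectors."""
    return [sum(c * b for c, b in zip(curve, vec)) for vec in basis]


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; centers start at the first k points (an assumption)."""
    centers = [list(pt) for pt in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(pt, centers[c])))
                  for pt in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_curves(curves, k, n_funcs):
    """Project each curve onto n_funcs basis vectors, then run k-means on the coefficients."""
    basis = cosine_basis(len(curves[0]), n_funcs)
    coeffs = [project(curve, basis) for curve in curves]
    return kmeans(coeffs, k)


if __name__ == "__main__":
    curves = [[0.0] * 8, [5.0] * 8, [0.1] * 8, [5.1] * 8]
    labels, _ = cluster_curves(curves, k=2, n_funcs=3)
    print(labels)
```

Working on a few coefficients rather than on the full discretized curves keeps the k-means step low dimensional; how many coefficients to keep, and in which basis, is exactly the choice the abstract says the paper discusses.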

    On principal curves with a length constraint

    Principal curves are defined as parametric curves passing through the “middle” of a probability distribution in $\mathbb{R}^d$. In addition to the original definition based on self-consistency, several points of view have been considered, among which is a least-squares type constrained minimization problem. In this paper, we are interested in theoretical properties satisfied by a constrained principal curve associated with a probability distribution with a second-order moment. We study open and closed principal curves $f : [0,1] \to \mathbb{R}^d$ with length at most $L$ and show in particular that they have finite curvature whenever the probability distribution is not supported on the range of a curve with length $L$. From the first-order condition, expressing that a curve is a critical point for the criterion, we derive an equation involving the curve, its curvature, and a random variable playing the role of the curve parameter. This equation allows us to show that a constrained principal curve in dimension 2 has no multiple points.
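
The abstract does not display the criterion itself. A least-squares problem with a length constraint is commonly written along the following lines; the symbols $X$ (a random vector with the given distribution) and $\Delta$ are notation assumed for this sketch rather than taken from the paper.

```latex
\[
  \min_{f} \; \Delta(f) := \mathbb{E}\Big[ \min_{t \in [0,1]} \lVert X - f(t) \rVert^{2} \Big]
  \quad \text{subject to} \quad \operatorname{length}(f) \le L,
\]
% X: random vector in R^d with finite second-order moment (assumed notation)
% f: curve from [0,1] to R^d; L: the prescribed maximal length
```

The second-order moment assumption mentioned in the abstract is what keeps the expected squared distance finite.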

    Estimation via length-constrained generalized empirical principal curves under small noise

    In this paper, we propose a method to build a sequence of generalized empirical principal curves, with selected length, so that, in Hausdorff distance, the images of the estimating principal curves converge in probability to the image of g